INTRODUCTION:


This work involves the development of statistical methodology for the analysis of multiple outcome data. The goal of this work is to extend current statistical methodology, in particular the method proposed by Wei, Lin and Weissfeld (1989). In the proposed methodology, the standard Cox model used in this multiple outcome procedure is replaced with the spline-based version of the Cox model proposed by Gray (1992). The advantage of this approach is that researchers obtain a detailed description of the relationship between survival time and a covariate, which is not available from the standard Cox regression model.

BODY: 

The work included in the statement of work has several components: the development of flexible marginal models for multiple time-to-event data using penalized B-spline based models, the extension of these models using pseudosplines, and the development of regression diagnostics and goodness-of-fit tests for these models. Progress has been made on each of these aims.

The investigators, Dr. Kiros Berhane and Dr. Lisa Weissfeld, are at the University of Southern California and the University of Pittsburgh, respectively. A graduate student researcher at each site works closely with the faculty member: Mr. Zekarias Berhane at the University of Pittsburgh and Ms. Maria Faccuseh at the University of Southern California. The investigators met twice over the past year. The first meeting took place in March, when Dr. Berhane visited the University of Pittsburgh, and the second took place in August at the Joint Statistical Meetings in Indianapolis. The Pittsburgh meeting was used to work on software development, to discuss inferential procedures for the proposed methodology, and to meet with Dr. Costantino, the NSABP investigator who is affiliated with the project. The August meeting was used to set priorities and goals for the upcoming 6 months. The graduate student researcher from the University of Pittsburgh also attended both meetings.

A research group examining the use of spline-based survival models was formed and met throughout much of the academic year. This group of University of Pittsburgh researchers consists of Dr. Weissfeld, Dr. Joyce Chang, Dr. Jeong and three Ph.D. students who are working with Dr. Weissfeld. Dr. Chang did much of the work on residual analysis that will be extended to the spline-based model setting. Dr. Jeong is an NSABP researcher and will also help with the analysis of the NSABP BCPT data. The group met weekly to discuss the literature on spline-based models.

Specific  Aim  1: 

The goal of this aim is to develop flexible marginal models for multiple time-to-event data using penalized B-spline based models. We have completed the theoretical development of the model and justified the methods for inference; this work is presented in the attached paper. We now have preliminary software to implement these models. The development of the software has taken considerable time, and at this point we are using a PC-based program. A PC-based program is key because the two investigators are at different locations: Dr. Weissfeld at the University of Pittsburgh and Dr. Berhane at the University of Southern California. The initial work in this area used a UNIX-based program that required access to either a Sun workstation or a UNIX-based mainframe. Dr. Robert Gray, who wrote the original program, kindly provided us with a Windows version of the software in early 2000. We now have a preliminary version of the program for multiple time-to-event data, which we are in the process of testing. We are also testing the software for the simulation study. Programs are e-mailed between Drs. Weissfeld and Berhane and the graduate student researchers who are also working on the project.

We are currently able to simulate data from a bivariate exponential distribution and are finishing a second routine that generates data from the bivariate exponential distribution proposed by Sarkar. We expect this aspect of the work to be completed shortly, so that the simulation results can be added to the attached draft of the paper. We are also requesting a data set from the NSABP BCPT and expect to receive it shortly. We are approximately 2 to 3 months behind schedule on this aim.
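The report does not say which bivariate exponential the first routine uses, so the sketch below is only a stand-in. It draws from the Marshall-Olkin bivariate exponential, a standard shock model. Note that this distribution has a singular component, P(T1 = T2) > 0, unlike Sarkar's absolutely continuous version. The function name and the common exponential censoring scheme are assumptions for illustration. The sketch produces two correlated failure times per subject plus observed times and failure indicators.

```python
import numpy as np

def marshall_olkin_bve(n, lam1, lam2, lam12, rng):
    """n draws (T1, T2) from the Marshall-Olkin bivariate exponential
    (hypothetical helper): T1 = min(E1, E12), T2 = min(E2, E12) with
    independent exponential shocks, so that marginally
    T1 ~ Exp(lam1 + lam12) and T2 ~ Exp(lam2 + lam12)."""
    e1 = rng.exponential(1.0 / lam1, size=n)
    e2 = rng.exponential(1.0 / lam2, size=n)
    e12 = rng.exponential(1.0 / lam12, size=n)   # common shock induces dependence
    return np.minimum(e1, e12), np.minimum(e2, e12)

rng = np.random.default_rng(3)
n = 20000
t1, t2 = marshall_olkin_bve(n, lam1=1.0, lam2=0.5, lam12=0.5, rng=rng)
c = rng.exponential(2.0, size=n)                 # assumed common censoring time
x1, d1 = np.minimum(t1, c), (t1 <= c).astype(int)
x2, d2 = np.minimum(t2, c), (t2 <= c).astype(int)
```

Because the rates are additive, the marginal means here are 1/1.5 and 1/1.0. The probability of a simultaneous failure is lam12 / (lam1 + lam2 + lam12) = 0.25.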

Specific  Aim  2: 

The goal of this aim is to develop flexible marginal models for multiple time-to-event data using pseudospline based models. We have completed the theoretical development of this model and the justification of the proposed inferential procedures. Software to implement the model is nearing completion. Dr. Berhane and his graduate student researcher at USC have worked on it intensively over the past month because of the graduate student researcher's decision to leave USC. This software will be linked with the work done at the University of Pittsburgh, where the graduate student researcher is very familiar with the program and with the work needed to see the project through to completion. The simulation programs written for Aim 1 apply directly to this model as well, so no new software development is needed for this phase of the simulation study.

The proposed work on this aim is well ahead of schedule and should be completed within the next several months.


Specific  Aim  3: 

We have begun work on the development of regression diagnostics for these models. Zekarias Berhane, the graduate student researcher based at the University of Pittsburgh, will develop regression diagnostics for these models as part of his dissertation work. He has begun reviewing the literature in this area and is currently studying the dissertation work of Dr. Joyce Chang, which served as the springboard for this specific aim. Work on this aim is slightly ahead of schedule.


Specific  Aim  4: 


The work on this aim is related to that of Aim 3. We have not yet begun the literature review for this work and have instead focused on pushing the work on Aim 2 forward.


KEY  RESEARCH  ACCOMPLISHMENTS: 

The  key  research  accomplishments  to  date  from  this  work  are: 

• a preliminary version of a program for multiple outcomes using a spline-based model

• a preliminary version of a program for multiple outcomes using a pseudospline based model

• software to simulate data from bivariate exponential distributions

• several new lines of research that will be pursued as a result of this work: an extension of the model to handle recurrent event data, and an analysis of the NSABP BCPT data using the method of Wei, Lin and Weissfeld (1989)
REPORTABLE OUTCOMES:


• a draft manuscript for the spline-based model is attached

• a manuscript for the pseudospline model is currently under development


CONCLUSIONS: 

This work will provide researchers with another tool to analyze multiple outcome survival data. The main advantage of this method is that it allows researchers to examine the effect of a covariate over the course of the study rather than relying on the "average" measure provided by the Cox proportional hazards model. Two changes occurred in the plan of the project over the first year. First, the software was moved from the mainframe to a Windows-based PC program. Second, greater emphasis was placed on Aim 2 because of an anticipated change in the graduate student researcher at the University of Southern California. Because of this, part of the work on Aim 1 was not finished. The work plan for the coming year will essentially follow that proposed in the grant, with the work on Aim 1 to be completed shortly.
Abstract


Penalized B-splines have been applied to time-to-event data, providing an extension of the proportional hazards model for a single outcome (Gray, 1994). We use this technique to extend the marginal models of Wei, Lin and Weissfeld (1989). This allows greater flexibility in modeling the margins and makes the formal development of inferential procedures possible. The method is illustrated with an example using data from the NSABP Breast Cancer Prevention Trial.

KEY WORDS: Survival analysis; Smoothing; Ridge regression; Additive models; Splines.
1 Introduction


The advent of promising drugs such as tamoxifen for the treatment and/or prevention of breast cancer has ignited both hope and controversy in the scientific world and among the general public. The controversy revolves around the adverse side effects of tamoxifen (ref.); some details about the NSABP-BCPT. To assess whether tamoxifen offers a net benefit or a net harm, one needs to weigh the advantages of the drug against its disadvantages simultaneously and comprehensively. To do this, one needs to be able to make simultaneous inference on several time-to-event outcomes. One also needs to flexibly model the effects of risk or prognostic factors that act non-linearly. Considerable progress has been made over the years in two areas: models that handle multiple time-to-event outcome data, and models that allow flexible modeling of the effects of prognostic factors on a single time-to-event outcome. To date, however, no flexible methods exist that allow for simultaneous inference on multiple, or recurrent, time-to-event outcomes.

The proportional hazards model (Cox, 1972) has received considerable attention as a popular way of modeling possibly censored time-to-event data. In addition to proportionality of the hazards, the model assumes that the effects of the predictors (risk factors) on the response follow a parametric (mostly linear) form. Recently, this assumption has been relaxed to allow for data-dependent, and possibly non-linear, covariate effects by exploiting the flexibility of nonparametric regression techniques (Hastie and Tibshirani, 1990). Fully non-parametric proportional hazards models (O'Sullivan, 1988; Hastie and Tibshirani, 1990), while attractively flexible, usually suffer from a heavy computational load and a lack of formal inferential procedures. Gray (1994) used the concept of pseudo-smoothers, with emphasis on penalized B-splines, to develop formal inference for proportional hazards models. Penalized B-splines provide an elegant compromise between regression splines and smoothing splines.


Another issue in the analysis of time-to-event data is the modeling of multiple, or recurrent, outcomes. This problem has received considerable attention in the statistical literature. For multiple outcome data, Wei, Lin and Weissfeld (1989) propose the use of marginal models. For the analysis of recurrent event data, three approaches have been proposed. Prentice, Williams and Peterson (1981) propose the use of conditional models. Andersen and Gill (1982) propose a modification of the proportional hazards model. Wei, Lin and Weissfeld (1989) apply the marginal approach to such data. However, these methods have not been extended to include flexible and possibly nonlinear effects of prognostic factors. On the other hand, many researchers have demonstrated that important prognostic factors (e.g., BMI) have a markedly non-linear effect on breast cancer survival and/or prognosis (Gray, 1994). The flexible methods used in that work, however, are limited to single outcomes. They do not lend themselves to simultaneous inference on several time-to-event outcomes.

In this article, we extend the marginal models of Wei, Lin and Weissfeld (1989) to allow modeling flexibility via the use of penalized B-splines in the style of Gray (1994). See also Hastie (1996) for a detailed discussion of a more general class of pseudo-smoothers. The remainder of the paper is organized as follows. In §2, we give background material on penalized B-splines and details on the proposed flexible marginal models. In §3, we perform extensive simulation studies to examine the small-sample properties of the proposed inferential procedures. §4 summarizes the results of applying the proposed methodology to data from the NSABP Breast Cancer Prevention Trial (BCPT), known as Protocol B-14, comparing tamoxifen to placebo for the prevention of recurrence in subjects with breast cancer. In §5, we summarize the main results and describe future directions for research. Details of the theoretical development and the asymptotic properties of the inferential procedures are given in the Appendix (?). Are we still planning to do this?
2 Proposed Model


2.1  Background 

To fix ideas, we first consider a non-parametric regression model in the univariate framework. Let $(x_1, y_1), \ldots, (x_n, y_n)$ denote a set of $n$ independent observations and consider a regression model of the form

$$y_i = f(x_i) + \epsilon_i, \tag{1}$$

where $f$ is an unspecified smooth function and $\epsilon_i \sim N(0, \sigma^2)$. In the non-parametric regression setup, one estimates $f(x)$ via a scatterplot smoother. A scatterplot smoother is said to be linear if it can be written as a linear map $S: \mathbb{R}^n \to \mathbb{R}^n$ defined by $\hat{\mathbf{y}} = S\mathbf{y}$, concentrating on the computation of the function only at the design points $\mathbf{x} = (x_1, \ldots, x_n)^T$. Here $\mathbf{y} = (y_1, \ldots, y_n)^T$ is the response vector. $S$ is referred to as a smoother matrix and is analogous to the hat matrix in linear regression. From this point onwards, our discussion focuses on penalized regression splines, even though the idea of pseudo-smoothers applies, in principle, to any linear smoother.
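As a concrete illustration of a linear smoother, the sketch below uses a discrete second-difference roughness penalty (a Whittaker-type smoother) rather than penalized B-splines. That substitution, the equally spaced design points, and the function name `whittaker_smoother_matrix` are assumptions made for brevity. The sketch shows that the fitted values are $S\mathbf{y}$ for a fixed matrix $S$. The trace of $S$ plays the role that the trace of the hat matrix plays in linear regression.

```python
import numpy as np

def whittaker_smoother_matrix(n, lam):
    """Smoother matrix S = (I + lam * D'D)^{-1}, where D is the (n-2) x n
    second-difference operator (hypothetical helper; assumes equally
    spaced design points)."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

S = whittaker_smoother_matrix(x.size, lam=10.0)
y_hat = S @ y        # fitted values are a linear map of the responses
edf = np.trace(S)    # effective degrees of freedom, like trace of a hat matrix
```

The second-difference penalty annihilates linear trends, so $S$ reproduces any linear function of $x$ exactly. For $\lambda > 0$, the effective degrees of freedom lie strictly between 2 and $n$.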

For a given number of knots and fixed knot positions, a regression spline representation that uses the B-spline basis functions $B_1(x), \ldots, B_{m+4}(x)$ is given as

$$f(x) = \gamma_0 + \gamma_1 x + \sum_{l=2}^{m+3} \gamma_l B_l(x).$$

Note that the constant and linear functions are stated explicitly. For identifiability, only $(m+2)$ of the B-spline basis functions are used (De Boor, 1974). A penalized form of this B-spline representation is obtained by adding the following roughness penalty to the residual sum of squares:

$$\lambda \int \{f''(u)\}^2 \, du .$$

Here, $\lambda$ is a smoothing parameter that determines the amount of smoothing. The penalty function given above is quadratic in the parameter vector $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_{m+3})^T$, so it can be rewritten as

$$\lambda \gamma^T K \gamma ,$$

where $K$ is a positive semi-definite matrix that is a function of the covariate and, more specifically, of the knot locations. Note that $K$ is an $(m+4) \times (m+4)$ matrix whose first two rows and first two columns are zeros, since the constant and linear functions pass unpenalized.
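The penalty matrix $K$ can be computed directly from its definition. The sketch below assumes cubic B-splines with $m$ interior knots and a clamped knot vector, which gives exactly $m+4$ basis functions. It also assumes that "dropping" two B-splines means omitting $B_1$ and $B_{m+4}$, matching the indices $l = 2, \ldots, m+3$ above. The function name is hypothetical. The sketch integrates products of second derivatives exactly with Gauss-Legendre quadrature on each knot interval. It then pads with zero rows and columns for the unpenalized constant and linear terms.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_penalty(x_min, x_max, interior_knots):
    """Return the knot vector t and the (m+4) x (m+4) penalty matrix K for the
    basis (1, x, B_2, ..., B_{m+3}), so that gamma' K gamma equals
    int f''(u)^2 du over [x_min, x_max] (hypothetical helper)."""
    interior = np.sort(np.asarray(interior_knots, dtype=float))
    m = interior.size
    t = np.r_[[x_min] * 4, interior, [x_max] * 4]   # clamped cubic knot vector
    n_bs = len(t) - 4                                 # m + 4 cubic B-splines
    basis = BSpline(t, np.eye(n_bs), 3)               # evaluates every B_l at once

    # B_l'' is linear on each knot interval, so a 2-point Gauss-Legendre rule
    # integrates the products B_j'' B_k'' exactly.
    nodes, weights = np.polynomial.legendre.leggauss(2)
    breaks = np.unique(t)
    G = np.zeros((n_bs, n_bs))
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        u = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)
        wts = 0.5 * (hi - lo) * weights
        d2 = basis(u, nu=2)                           # shape (2, n_bs)
        G += d2.T @ (wts[:, None] * d2)

    K = np.zeros((m + 4, m + 4))   # rows/cols 0 and 1: constant and linear terms
    K[2:, 2:] = G[1:-1, 1:-1]      # keep B_2, ..., B_{m+3}
    return t, K

t, K = bspline_penalty(0.0, 1.0, [0.25, 0.5, 0.75])
```

With both boundary B-splines removed, no non-zero combination of $B_2, \ldots, B_{m+3}$ is linear. The block `K[2:, 2:]` is therefore positive definite, and $K$ as a whole is positive semi-definite.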

This idea was first introduced in Hastie and Tibshirani (1990), and its use in univariate proportional hazards models was detailed in Gray (1994). Gray (1994) also develops (and validates) appropriate testing procedures for main effects, interactions and non-linear time dependency of covariate effects in the proportional hazards model (any more details here?). In this paper, we extend this technology to the multivariate proportional hazards models of Wei, Lin and Weissfeld (1989).

2.2  The  model 

To model the marginal distributions of multivariate time-to-event data, let us consider a flexible proportional hazards model for each of the $G$ failure types. For the $g$th type of failure of the $i$th subject, $i = 1, \ldots, n$, the model can be written as

$$\lambda_{gi}(t) = \lambda_{g0}(t) \exp\Big\{\sum_{j=1}^{p} f_{jg}(Z_{jgi})\Big\}, \quad t > 0, \tag{2}$$

where $\lambda_{g0}(t)$ is an unspecified baseline hazard function and $f_{jg}$, $j = 1, \ldots, p$, denote unspecified smooth functions. In the usual setup (Cox, 1972), one observes data of the form $(X_{gi}, Z_{gi}, \Delta_{gi})$. Here $X_{gi} = \min(T_{gi}, C_{gi})$, $T_{gi}$ is the failure time, $C_{gi}$ is the censoring time, and $Z_{gi}(t) = (Z_{1gi}(t), \ldots, Z_{pgi}(t))^T$. The indicator $\Delta_{gi} = 1$ if $X_{gi} = T_{gi}$ and 0 otherwise.

Model (2) is fully non-parametric in the covariate effects and quite general. Note also that the fully linear model of Wei, Lin and Weissfeld (1989) is the special case of (2) with $f_{jg}(Z_{jgi}) = \beta_{jg} Z_{jgi}$. For this fully linear model, the partial likelihood is given as

$$PL_g(\beta_g) = \prod_{i=1}^{n} \left[ \frac{\exp\{\beta_g^T Z_{gi}(X_{gi})\}}{\sum_{l \in \mathcal{R}_g(X_{gi})} \exp\{\beta_g^T Z_{gl}(X_{gi})\}} \right]^{\Delta_{gi}}, \tag{3}$$

where $\beta_g = (\beta_{1g}, \ldots, \beta_{pg})^T$. The set $\mathcal{R}_g(t) = \{l : X_{gl} \ge t\}$ contains the subjects at risk just prior to time $t$ with respect to the $g$th type of failure. The solution $\hat\beta_g$ to $\partial \log PL_g(\beta_g) / \partial \beta_g = 0$ can be shown to be a consistent estimator of $\beta_g$, provided that the fully linear model is correctly specified (Andersen and Gill, 1982).
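A minimal sketch of (3) for a single failure type is given below. It assumes time-fixed covariates and no tied failure times, so no Breslow or Efron tie correction is needed. The function names and the simulated data are hypothetical, and this is not the authors' software. The sketch evaluates the log partial likelihood over the risk sets $\mathcal{R}_g(X_{gi})$ and solves the score equation by Newton-Raphson.

```python
import numpy as np

def cox_log_pl(beta, X, delta, Z):
    """Log partial likelihood (3), its score and observed information for one
    failure type. Assumes time-fixed covariates Z (n x p) and no tied times."""
    eta = Z @ beta
    w = np.exp(eta)
    loglik, score = 0.0, np.zeros(beta.size)
    info = np.zeros((beta.size, beta.size))
    for i in np.flatnonzero(delta):           # only observed failures contribute
        at_risk = X >= X[i]                   # risk set R_g(X_gi)
        w_r, Z_r = w[at_risk], Z[at_risk]
        s0 = w_r.sum()
        zbar = w_r @ Z_r / s0                 # risk-set weighted mean of Z
        loglik += eta[i] - np.log(s0)
        score += Z[i] - zbar
        zc = Z_r - zbar
        info += (w_r[:, None] * zc).T @ zc / s0   # risk-set weighted covariance
    return loglik, score, info

def fit_cox(X, delta, Z, n_iter=50, tol=1e-10):
    """Solve d log PL / d beta = 0 by Newton-Raphson, starting from beta = 0."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        _, score, info = cox_log_pl(beta, X, delta, Z)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated example: hazard exp(0.5 z1 - 0.3 z2) with independent censoring.
rng = np.random.default_rng(1)
n = 400
Z = rng.normal(size=(n, 2))
T = rng.exponential(1.0 / np.exp(Z @ np.array([0.5, -0.3])))
C = rng.exponential(2.0, size=n)
X = np.minimum(T, C)
delta = (T <= C).astype(int)
beta_hat = fit_cox(X, delta, Z)
```

The log partial likelihood is concave in $\beta_g$, which is why plain Newton-Raphson from zero is usually adequate here.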

In practical applications, the effects of most covariates are adequately described by a parametric form, while a few are best modeled via non-parametric smoothers. For simplicity of discussion, we first consider a model with $p$ parametric terms and one additional non-parametric term, i.e.,

$$\lambda_{gi}(t) = \lambda_{g0}(t) \exp\Big\{\sum_{j=1}^{p} \beta_{jg} Z_{jgi} + f_g(h_{gi})\Big\}, \quad t > 0. \tag{4}$$

We propose to estimate $f_g(h_{gi})$ using the penalized regression spline approach discussed in §2.1, i.e.,

$$f_g(h_g) = \gamma_{1g} h_g + \sum_{l=2}^{m+3} \gamma_{lg} B_{lg}(h_g). \tag{5}$$

Note that we have now dropped the constant term, since it is absorbed into the baseline hazard. Following the notation of Gray (1994), let $\gamma_g = (\gamma_{2g}, \ldots, \gamma_{(m+3)g})^T$ and $\eta_g = (\gamma_{1g}, \gamma_g^T)^T$. A penalized log partial likelihood includes a penalty function to favor smoother alternatives. It is defined as

$$\log PPL_g(\beta_g, \eta_g) = \log PL_g(\beta_g, \eta_g) - \tfrac{1}{2} \lambda_g \eta_g^T K_g \eta_g , \tag{6}$$

where $K_g$ is a positive semi-definite matrix that is a function of the covariate $h_g$, as in §2.1. Note that $K_g$ is an $(m+3) \times (m+3)$ matrix whose first row and first column are zeros, since the linear function passes unpenalized.
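A sketch of maximizing a penalized log partial likelihood of the form (6) by Newton-Raphson is given below. It uses the penalized score $U - \lambda K \psi$ and the penalized information $A + \lambda K$. Several elements are assumptions for illustration: the simulated data, the knot positions, the hypothetical function names, and the penalty itself. A second-difference penalty on the B-spline coefficients stands in for the integrated squared second derivative. The exact $K_g$ from §2.1 could be substituted in the same slots. The design has columns $z$, $h$, $B_2(h), \ldots, B_{m+3}(h)$, and the penalty matrix is zero for the rows and columns of $z$ and $h$.

```python
import numpy as np
from scipy.interpolate import BSpline

def cox_score_info(psi, X, delta, P):
    """Unpenalized score and information of log PL(psi) for design P (n x r).
    Assumes time-fixed covariates and no tied failure times."""
    w = np.exp(P @ psi)
    U = np.zeros(P.shape[1])
    A = np.zeros((P.shape[1], P.shape[1]))
    for i in np.flatnonzero(delta):
        r = X >= X[i]                                  # risk set at X_i
        wr, Pr = w[r], P[r]
        pbar = wr @ Pr / wr.sum()
        U += P[i] - pbar
        Pc = Pr - pbar
        A += (wr[:, None] * Pc).T @ Pc / wr.sum()
    return U, A

def penalized_cox_fit(X, delta, P, K_full, lam, n_iter=50, tol=1e-9):
    """Maximize log PL(psi) - 0.5 * lam * psi' K_full psi by Newton-Raphson."""
    psi = np.zeros(P.shape[1])
    for _ in range(n_iter):
        U, A = cox_score_info(psi, X, delta, P)
        step = np.linalg.solve(A + lam * K_full, U - lam * K_full @ psi)
        psi = psi + step
        if np.max(np.abs(step)) < tol:
            break
    return psi

# Simulated data: linear effect of z, non-linear effect of h on the log hazard.
rng = np.random.default_rng(5)
n = 300
z, h = rng.normal(size=n), rng.uniform(size=n)
T = rng.exponential(1.0 / np.exp(0.5 * z + np.sin(2 * np.pi * h)))
C = rng.exponential(2.0, size=n)
X, delta = np.minimum(T, C), (T <= C).astype(int)

# Columns z, h, B_2(h), ..., B_{m+3}(h): cubic B-splines, m = 3 interior knots.
t = np.r_[[0.0] * 4, 0.25, 0.5, 0.75, [1.0] * 4]
B = BSpline(t, np.eye(len(t) - 4), 3)(h)[:, 1:-1]      # drop B_1 and B_{m+4}
P = np.column_stack([z, h, B])

# Stand-in penalty (assumption): second differences of the spline coefficients.
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)
K_full = np.zeros((P.shape[1], P.shape[1]))
K_full[2:, 2:] = D.T @ D
lam = 1.0
psi_hat = penalized_cox_fit(X, delta, P, K_full, lam)
```

At convergence, the penalized score is zero. The unpenalized score alone is not zero, because the penalty pulls the spline coefficients toward smoother fits.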

The hypotheses of interest with respect to the smooth function are then $\eta_g = 0$ and $\gamma_g = 0$, representing the hypotheses of "no effect" and "linear effect", respectively. More details here on a summarized version of Gray's tests for a univariate outcome.

It is straightforward to extend this model to allow for multiple, say $q$, non-parametric terms. In this case, $\eta_g$ is a longer vector that stacks the coefficients of the basis functions of the $q$ terms. Here $\eta_g = (\eta_{g1}^T : \ldots : \eta_{gq}^T)^T$ has dimension $q(m+3) \times 1$. The penalty term is the sum of the $q$ penalty functions, leading to

$$\log PPL_g(\beta_g, \eta_g) = \log PL_g(\beta_g, \eta_g) - \tfrac{1}{2} \sum_{j=1}^{q} \lambda_{gj} \eta_{gj}^T K_{gj} \eta_{gj}, \tag{7}$$

where each non-parametric term has its own smoothing parameter, $\lambda_{gj}$, and penalty matrix, $K_{gj}$. Here, one could test for the "overall" effect or the "linearity" of individual non-parametric terms or of a combination of them; more details here.
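In computations, the sum of $q$ penalties in (7) is conveniently handled as a single block-diagonal matrix acting on the stacked $\eta_g$. The short sketch below illustrates this. The helper name and the toy diagonal penalty matrices are assumptions for illustration. With $p$ parametric terms, the result would be further padded with zero rows and columns for $\beta_g$.

```python
import numpy as np
from scipy.linalg import block_diag

def combined_penalty(penalties, lams):
    """Block-diagonal diag(lam_1 K_1, ..., lam_q K_q) (hypothetical helper),
    so that eta' K_comb eta = sum_j lam_j eta_j' K_j eta_j as in (7)."""
    return block_diag(*[lam * K for lam, K in zip(lams, penalties)])

K1 = np.diag([0.0, 1.0, 2.0])      # toy penalties: first entry unpenalized
K2 = np.diag([0.0, 3.0, 4.0])
Kc = combined_penalty([K1, K2], [2.0, 0.5])
eta = np.arange(6.0)               # stacked (eta_1', eta_2')'
```

Here `eta @ Kc @ eta` equals 2(1 + 8) + 0.5(48 + 100) = 92.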

2.3  Inference 

While making inference on each of the margins is important, this can be done easily using the developments in Gray (1994). Our main interest is in conducting simultaneous inference on several time-to-event outcomes in models that have non-parametric smooth terms. Once the marginal distributions are modeled, the methods described in Wei, Lin and Weissfeld (1989) can be extended in two ways: to test for trends across parameter estimates, and to combine estimates across margins to test for covariate effects of interest. In our extension to the multivariate survival data framework, we use testing procedures that are slightly different from, but equivalent to, those of Gray (1994). These are used for both the univariate (marginal) and the simultaneous inferences. Consider the case where we have $p$ parametric terms and one additional non-parametric term, as in (4). Then, for outcome $g$, the unpenalized part of equation (6) can be written as

$$PL_g(\beta_g, \eta_g) = \prod_{i=1}^{n} \left[ \frac{\exp\{\beta_g^T Z_{gi} + \eta_g^T H_{gi}\}}{\sum_{l \in \mathcal{R}_g(X_{gi})} \exp\{\beta_g^T Z_{gl} + \eta_g^T H_{gl}\}} \right]^{\Delta_{gi}}, \tag{8}$$

where $H_{gi} = (h_{gi}, B_{2g}(h_{gi}), \ldots, B_{(m+3)g}(h_{gi}))^T$ and all other components are as defined in §2.2 for the $g$th type of failure. Let $\psi_g = (\beta_g^T, \eta_g^T)^T$. Let $P_g = (Z_{1g} : \ldots : Z_{pg} : h_g : B_{2g}(h_g) : \ldots : B_{(m+3)g}(h_g))$, with $P_{gr}$ denoting the $r$th column vector, $r = 1, \ldots, (m+p+3)$. Let $A_g$ be the unpenalized information matrix for the $g$th outcome as a function of $\psi_g$. Finally, let $K_g$ here denote the penalty matrix of (6) padded with zeros in the rows and columns corresponding to $\beta_g$. Then it can be shown that

$$\sqrt{n}(\hat\psi_g - \psi_{g0}) = n (A_g + \lambda_g K_g)^{-1} n^{-1/2} U_g(\psi_{g0}) + o_p(1),$$

where $U_g(\psi_{g0})$ is the score vector and $\psi_{g0}$ is the vector of true parameter values for the $g$th outcome (Gray, 1994). It then follows from the asymptotic normality of $n^{-1/2} U_g(\psi_{g0})$ that $\sqrt{n}(\hat\psi_g - \psi_{g0})$ is asymptotically normal with mean 0 and variance given by the limit of $nV_g$, where

$$V_g = (A_g + \lambda_g K_g)^{-1} A_g (A_g + \lambda_g K_g)^{-1}. \tag{9}$$
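Equation (9) is a sandwich formula, and the sketch below evaluates it for given matrices. As an extra summary it also reports $\operatorname{tr}\{(A_g + \lambda_g K_g)^{-1} A_g\}$, a commonly used effective-degrees-of-freedom measure for penalized likelihood fits. Treat that summary as an assumption; it is not part of the development above. The matrices are illustrative placeholders, and the function name is hypothetical.

```python
import numpy as np

def penalized_sandwich(A, K, lam):
    """V = (A + lam K)^{-1} A (A + lam K)^{-1} as in (9), plus the effective
    degrees of freedom trace{(A + lam K)^{-1} A} (hypothetical helper)."""
    H_inv = np.linalg.inv(A + lam * K)
    V = H_inv @ A @ H_inv
    df = np.trace(H_inv @ A)
    return V, df

rng = np.random.default_rng(2)
M = rng.normal(size=(6, 6))
A = M @ M.T + 6 * np.eye(6)               # placeholder information matrix (SPD)
K = np.zeros((6, 6))
D = np.diff(np.eye(5), n=2, axis=0)
K[1:, 1:] = D.T @ D                       # first component (linear term) unpenalized
V0, df0 = penalized_sandwich(A, K, 0.0)   # lam = 0: V0 = A^{-1}, df0 = 6
V5, df5 = penalized_sandwich(A, K, 5.0)
```

With $\lambda = 0$ the formula collapses to the usual inverse information. As $\lambda$ grows, the effective degrees of freedom shrink toward the dimension of the unpenalized subspace, which is 3 in this toy example.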


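To develop the simultaneous inferential procedures for several outcomes, we first define, following Wei, Lin and Weissfeld (1989),

$$W_{gi}(\psi_g) = \Delta_{gi}\left\{P_{gi} - \frac{S_g^{(1)}(\psi_g, X_{gi})}{S_g^{(0)}(\psi_g, X_{gi})}\right\} - \frac{1}{n}\sum_{l=1}^{n} \frac{\Delta_{gl} Y_{gi}(X_{gl}) \exp(\psi_g^T P_{gi})}{S_g^{(0)}(\psi_g, X_{gl})} \left\{P_{gi} - \frac{S_g^{(1)}(\psi_g, X_{gl})}{S_g^{(0)}(\psi_g, X_{gl})}\right\}, \tag{10}$$

where $P_{gi}$ is the $i$th row of $P_g$ (written as a column vector),

$$S_g^{(0)}(\psi_g, t) = n^{-1} \sum_{i=1}^{n} Y_{gi}(t) \exp\Big(\sum_{s} P_{gis} \psi_{gs}\Big), \qquad S_g^{(1)}(\psi_g, t) = n^{-1} \sum_{i=1}^{n} Y_{gi}(t) \exp\Big(\sum_{s} P_{gis} \psi_{gs}\Big) P_{gi},$$

and $Y_{gi}(t) = I(X_{gi} \ge t)$. Then the asymptotic covariance matrix between $\sqrt{n}(\hat\psi_u - \psi_{u0})$ and $\sqrt{n}(\hat\psi_v - \psi_{v0})$ can be consistently estimated by $n D_{uv}(\hat\psi_u, \hat\psi_v)$, where

$$D_{uv}(\psi_u, \psi_v) = (A_u + \lambda_u K_u)^{-1} B_{uv}(\psi_u, \psi_v) (A_v + \lambda_v K_v)^{-1}, \tag{11}$$

$B_{uv}(\psi_u, \psi_v) = \sum_{i=1}^{n} W_{ui}(\psi_u) W_{vi}(\psi_v)^T$, and $W_{ui}$ and $W_{vi}$ are defined in (10). Thus, the covariance matrix of $(\hat\psi_1^T, \ldots, \hat\psi_G^T)^T$ can be consistently estimated by

$$Q = \begin{pmatrix} D_{11}(\hat\psi_1, \hat\psi_1) & \cdots & D_{1G}(\hat\psi_1, \hat\psi_G) \\ \vdots & \ddots & \vdots \\ D_{G1}(\hat\psi_G, \hat\psi_1) & \cdots & D_{GG}(\hat\psi_G, \hat\psi_G) \end{pmatrix}. \tag{12}$$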

The above asymptotic results are based on the approach used in Wei, Lin and Weissfeld (1989). Note that $Q$ is constructed as a function of the information matrix, the penalty matrix, the smoothing parameter and the individual contributions to the score vector; that is, a separate term is computed for each of the $n$ observations. Note also how the two likelihoods are used in the above approximation. The penalized version of the likelihood is used to compute the information matrix. The original (unpenalized) version is used to compute the individual contributions to the score vector. An alternative estimator can be obtained by using the penalized version of the likelihood in the computation of $W$ as given in equation (10). Is this true?

Note that the penalty matrix $K_g$ contributes to the penalized score and information matrix only for the last $(m+2)$ components of $\psi_g$. Inference for the first $p$ parametric terms is directly analogous to that of Wei, Lin and Weissfeld (1989). For the non-parametric term, one could conduct simultaneous inference on the "overall" effect and/or the "linearity" of $h$ across failure types. Let $\theta_g$ denote the components of $\psi_g$ that correspond to the relevant coefficients of the non-parametric term $h_g$. Let $\Gamma$ denote the sub-matrix of $Q$ corresponding to $\theta = (\theta_1^T, \ldots, \theta_G^T)^T$. Then one could use the quadratic form $\hat\theta^T \Gamma^{-1} \hat\theta$ to conduct a joint test of the null hypotheses $\theta_g = 0$, $g = 1, \ldots, G$. The tests for "overall" significance and "linearity" are obtained in this setup by taking $\theta_g$ to be the last $(m+3)$ and the last $(m+2)$ elements of $\psi_g$, respectively. These correspond to $\eta_g$ and $\gamma_g$.
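The joint test described above reduces to selecting coefficients from the stacked estimate and the matching sub-matrix of $Q$, then forming a Wald-type quadratic form. The sketch below is illustrative only. It assumes $G = 2$, $p = 1$ and $m = 3$, and the placeholder estimates and covariance are not real data. It also uses a naive chi-square reference with degrees of freedom equal to the number of tested coefficients. For penalized fits, that reference distribution is only an approximation. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def joint_wald_test(psi_stacked, Q, idx):
    """Quadratic form theta' Gamma^{-1} theta for the entries idx of the
    stacked estimate (psi_1', ..., psi_G')', with Gamma = Q[idx, idx]."""
    theta = psi_stacked[idx]
    Gamma = Q[np.ix_(idx, idx)]
    stat = float(theta @ np.linalg.solve(Gamma, theta))
    df = len(idx)
    # Naive chi-square reference (assumption); approximate for penalized fits.
    return stat, df, chi2.sf(stat, df)

# Layout assumed here: G = 2 failure types, p = 1 parametric term, m = 3,
# so each psi_g has r = p + m + 3 = 7 entries.
G, p, m = 2, 1, 3
r = p + m + 3
# "Linearity" test: the last m + 2 entries of each psi_g.
idx = np.concatenate([g * r + np.arange(r - (m + 2), r) for g in range(G)])
psi_stacked = np.linspace(-0.3, 0.3, G * r)     # placeholder estimates
Q = 0.04 * np.eye(G * r)                        # placeholder covariance
stat, df, p_value = joint_wald_test(psi_stacked, Q, idx)
```

For the "overall" test, the index range would instead cover the last $m + 3$ entries of each block, i.e. `np.arange(r - (m + 3), r)`.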

Test for trends? Is this possible in the penalized B-spline framework? This could be an advantage of pseudosplines. Because they have ordered levels of complexity, one could test for equality of the comparable components of the smooth functions.

In the above setup, we assume that the amount of smoothing (i.e., the value of the smoothing parameter) is fixed by the analyst, either from prior knowledge or through a grid search. It may also be possible to develop automatic procedures for selecting the number and position of the knots and the value of $\lambda_g$. The number of knots is usually between 10 and 15 per outcome. We will discuss the potential effects of various choices of the number of knots in our simulation studies. We follow Gray (1994) in placing the knots at locations that yield approximately equal numbers of observations between knots. The choice of the smoothing parameters could also be addressed as a model selection problem, but we do not pursue this issue further in this manuscript; we intend to report results elsewhere. Is this enough or the right strategy?
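The knot-placement rule described above (roughly equal numbers of observations between knots) amounts to putting the interior knots at equally spaced sample quantiles of the covariate. The sketch below assumes 10 interior knots, at the low end of the range mentioned above, and a skewed simulated covariate. The helper name is hypothetical.

```python
import numpy as np

def quantile_knots(h, n_knots):
    """Interior knots at equally spaced sample quantiles of h, so that roughly
    equal numbers of observations fall between adjacent knots."""
    probs = np.arange(1, n_knots + 1) / (n_knots + 1)
    return np.quantile(h, probs)

rng = np.random.default_rng(6)
h = rng.lognormal(size=1000)                   # a skewed covariate
knots = quantile_knots(h, 10)
counts, _ = np.histogram(h, bins=np.r_[h.min(), knots, h.max()])
```

For a skewed covariate, this places knots densely where the data are dense. Each of the 11 intervals then receives about 1000 / 11 ≈ 91 observations.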

3  Simulation  Study 

Initial  details  as  in  the  outline? 


4  Examples:  The  NSABP  Data 

Initial  details  as  in  the  two  substantive  papers  from  Joe  Costantino? 


5  Discussion 

•  summarize  main  results  and  findings 

•  relevance  to  breast  cancer  research 

•  discuss  related  research  and  open  areas  of  research 